Open Cell Research (2016) 26:21-33. npg 
ORIGINAL ARTICLE www.nature.com/cr 


Out of southern East Asia: the natural history of domestic 
The origin and evolution of the domestic dog remains a controversial question for the scientific community, with 
basic aspects such as the place and date of origin, and the number of times dogs were domesticated, open to dispute. 
Using whole genome sequences from a total of 58 canids (12 gray wolves, 27 primitive dogs from Asia and Africa, and 
a collection of 19 diverse breeds from across the world), we find that dogs from southern East Asia have significantly 
higher genetic diversity compared to other populations, and are the most basal group relating to gray wolves, indi- 
cating an ancient origin of domestic dogs in southern East Asia 33 000 years ago. Around 15 000 years ago, a subset 
of ancestral dogs started migrating to the Middle East, Africa and Europe, arriving in Europe at about 10 000 years 
ago. One of the out of Asia lineages also migrated back to the east, creating a series of admixed populations with the 
endemic Asian lineages in northern China before migrating to the New World. For the first time, our study unravels 
an extraordinary journey that the domestic dog has traveled on earth. 
Introduction

The domestic dog (Canis lupus familiaris), one of our
closest companions in the animal kingdom, has followed
us to every continent of the world. As a single species,
the domestic dog embodies one of the largest collections
of phenotypic diversity for any species living on earth [1].
Due to their cognitive and behavioral abilities, domestic
dogs have been selected to fulfill a wide variety of tasks
including hunting, herding and companionship. The
genetic and historical basis of these phenotypic changes has
intrigued the scientific community, including Darwin [2].
and in the second stage, the primitive forms were fur-
abilities and morphology [3-5]. Despite many efforts
studying dog evolution, several basic aspects about the 
origin and evolution of the domestic dog are still in dis- 
pute. For example, several different geographical regions 
have been proposed as the birthplace of domestic dogs, 
and the date of divergence between wolves and dogs has 
been estimated between 32 000 years ago and 10 000 
years ago [6-10], with relatively weak gene flows found 
between these two groups since their divergence [4, 6, 7, 
9]. The exact history of dog domestication thus remains 
to be fully resolved [11]. 

The first comprehensive genetic investigations of the 
geographical origin of dogs were based on global intra- 
specific studies of maternally transmitted DNA (mtDNA) 
in contemporary dogs, which gave a strong indication 
that dogs originated in the southern part of East Asia 
[7, 12]. However, several subsequent studies based on 
diverse genetic markers have given discrepant answers. 
For example, using mtDNA from ancient dog samples, 
Thalmann et al. have suggested Europe as the place of 
origin [13]. Likewise, using genome-wide genotyping of 
modern dogs, vonHoldt et al. found high haplotype shar- 
ing between Middle Eastern wolves and dogs, proposing 
the Middle East as the major source of dog diversity [14]. 

Although the datasets and approaches are different in 
these studies, a common drawback of these single nu- 
cleotide polymorphism (SNP) array- and ancient DNA- 
based studies is a lack of samples from southern East 
Asia, thus precluding evaluation of the possible scenario 
that domestic dogs actually originated in this region. In 
addition, the use of a single locus, especially mtDNA, 
can skew the conclusion as it is more malleable by sto- 
chastic and/or selective forces [7, 12, 13]. Thus, the his- 
tory of dog domestication remains enigmatic and highly 
controversial [11]. 

Whole genome sequencing provides a powerful holis- 
tic approach to understanding the evolutionary history of 
a species, and is sufficiently robust in mitigating prob- 
lems such as SNP ascertainment bias or stochastic effects 
acting on a single marker, which have influenced earlier 
studies [15]. In this work, we collected the genome se- 
quences of 58 canids from across the world, including 
samples from southern and northern parts of East Asia, 
Africa, Europe, the Middle East, Siberia and the Ameri- 
cas. Population genetic analysis reveals an ancient origin 
for the domestic dog in southern East Asia about 33 000 
years ago. After evolving for several thousand years in 
East Asia, a subgroup of dogs radiated out of southern 
East Asia about 15 000 years ago to the Middle East, Af- 
rica as well as Europe. One of these out of Asia lineages 
then migrated back to northern China and made a series 
of admixtures with endemic East Asian lineages, before 


traveling to the Americas. Our study, for the first time, 
reveals the extraordinary journey that the domestic dog 
has traveled on this planet during the past 33 000 years. 


Results 


Sample collection and whole genome sequencing 

58 canids from around the world were gathered for 
this study. This collection includes 12 gray wolves from 
across the Eurasian continent, 11 indigenous dogs from 
southern East Asia, 12 indigenous dogs from northern 
East Asia, 4 village dogs from Africa (Nigeria) and a set 
of 19 diverse dog breeds distributed across the Old World 
and the Americas. 

Chinese indigenous dogs are dogs living in the coun- 
tryside of China [16] (Supplementary information, Data 
S1 and Figure S1) and were sampled across the geographic
range of rural China, including many remote
regions in Yunnan and Guizhou in southern China (Sup- 
plementary information, Table S1). The breeds include 
dogs from Central Asia (Afghan Hound) and North Afri- 
ca (Sloughi), Europe (eight different breeds), the Arctic 
and Siberia (Greenland dog, Alaska Malamute, Samo- 
yed, Siberian Husky, and East Siberian Laika), the New 
World (Chihuahua, Mexican and Peruvian naked dog) as 
well as the Tibetan Plateau (Tibetan Mastiff). These dogs 
were chosen to cover as many major geographic regions 
as possible (Figure 1A and Supplementary information, 
Table S1). 

After DNA extraction, individual genomes were se- 
quenced to an average of 15x coverage (Supplementary 
information, Table S1). Of the 58 individuals, 4 gray
wolves and 6 dogs have been sequenced in a previous 
study [10]. DNA sequence analysis was done using the 
Genome Analysis Toolkit [17]. After stringent filtering, 
we identified 20 353 184 SNPs and 3 856 246 small 
indels (Figure 1B), most of which are shared between 
groups. For example, 40.3% of the SNPs are shared be- 
tween wolves, indigenous dogs and dog breeds, reflect- 
ing their recent divergence (Figure 1B). Using Sanger 
sequencing, we verified that the sequencing strategy was 
highly sensitive (false negative rate around 10%) and that the
proportion of false positives was less than 5% (Supplementary
information, Data S2 and Figure S2).


Genetic diversity and population structure 

Comparison of the two haploid genomes within each 
individual yields the genetic diversity θ (4Nμ) for the
58 individuals. As shown in Figure 1C, genetic diversity 
shows a decreasing trend from wolves to Chinese indig- 
enous dogs (preserving 78% of the wolf heterozygosity) 
and subsequently to dog breeds (66% of the wolf heterozygosity),
with the African village dogs having a genetic
diversity comparable to many dog breeds (69% of the
wolf heterozygosity). Among the dog breeds, genetic
diversity varies dramatically. For
example, the East Asian breeds Tibetan Mastiff and East
Siberian Laika show levels of diversity comparable to
the Chinese indigenous dogs, but many of the European
dog breeds have considerably reduced genetic diversity.
Such dramatic differences in genetic diversity can be
influenced by both ancient and recent inbreeding history.

Figure 1 Population structure and genetic diversity of 58 canids. (A) Geographic locations of the 58 canids sequenced in this
study. (B) Number of SNPs and small indels called in this study. (C) Genetic diversity for the 58 canids. AF, African village
dogs; BEM, Belgian Malinois; CHI, Chihuahua; FIL, Finnish Lapphund; GAL, Galgo; GNE, Gray Norwegian Elkhound; GSD,
German Shepherd Dog; JAM, Jamthund; LAH, Lapponian Herder; MEN, Mexican naked (hairless); PEN, Peruvian naked
(hairless); SWL, Swedish Lapphund; AFG, Afghan Hound; SLO, Sloughi; SAM, Samoyed; ESL, East Siberian Laika; SIH,
Siberian Husky; ALM, Alaska Malamute; GRD, Greenland dog; TIM, Tibetan Mastiff. (D) Structure analysis of the 58 canids.
(E) Genetic diversity of the different groups. AF, African village dogs; EB, European breeds; SI, southern Chinese indigenous
dogs; W, wolves. (F) Linkage disequilibrium patterns for the different groups. (G) Principal component analysis of the 58
canids. Inset is for all individuals and the large panel is for dogs only. (H) Principal component plot for a large collection of
canids together with our data. (I) A clock-like tree (UPGMA) for all the 58 individuals [56].

To explore the genetic relationships among these in- 
dividuals, we performed a structure analysis using an 
expectation maximization (EM) algorithm to cluster the 
individuals into different numbers of groupings [18]. 
When partitioning the individuals into two groups, the 
algorithm separates the dogs from the wolves, with very 
limited admixture observed (Figure 1D). Further dividing 
the individuals into three subsets split the dogs into two 
clusters, with indigenous dogs from southern East Asia 
representing one subset and the other subset consisting of 
dog breeds from Europe and South/Central America and 
the African village dogs. Indigenous dogs from northern 
China and dog breeds from the Arctic and Central Asia, 
the Middle East and North Africa show a mixture of these 
components with varying proportions. This observation 
implies that there are two divergent groups of dogs: one
is the East Asian component and the other is the non-East Asian
component. It is important to emphasize that individuals
with mixed constituents identified in the structure anal- 
ysis are not always due to true admixture events, since 
populations of intermediate genotypes between these two 
groups tend to display mixed components (e.g., originat- 
ed shortly after the split of two clades, Supplementary 
information, Data S3 and Figure S3). Further partitioning 
into four and five groups leads to the separation of the 
African village dogs and the breed dogs from the eastern 
Arctic regions (i.e., Siberian Husky, Alaska Malamute 
and the Greenland dog). 

Genetic diversity among individuals (Figure 1C) may 
be heavily influenced by ancient as well as recent histo- 
ry, e.g., breeding programs during the last few thousand 
years or the past few hundred years. However, combined 
information from multiple breeds may reveal information 
about the ancestral populations that gave rise to them, 
since each breed has experienced a separate breeding history.
We therefore calculated the genetic diversity (θπ) for
the “pure groups” informed by the structure analysis (K =
4, Figure 1D). As shown in Figure 1E, dog breeds, most
of which are of European origin, carry lower diversity than
the Chinese indigenous dogs as a group, but have higher 
genetic diversity than the African indigenous dogs. This 


suggests that the ancestral population that gave rise to the 
European breeds was larger than the ancestral population 
of the African indigenous dogs. Linkage disequilibrium 
patterns also show similar trends (Figure 1F).
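
As a minimal sketch of what a group-level diversity estimate involves, the following computes nucleotide diversity (the average number of pairwise differences per site) from a matrix of phased 0/1 haplotypes. The function name, the input layout, and the toy data are assumptions made for illustration; this is not the pipeline used in the study.

```python
import numpy as np

def nucleotide_diversity(haplotypes, seq_len):
    """Average pairwise differences per site for one group.

    haplotypes: array of shape (n_haplotypes, n_snps) with 0/1 alleles
                (hypothetical input layout).
    seq_len:    number of sites surveyed, including invariant ones.
    """
    haps = np.asarray(haplotypes)
    n = haps.shape[0]
    if n < 2:
        raise ValueError("need at least two haplotypes")
    derived = haps.sum(axis=0)            # derived-allele count per SNP
    diff_pairs = derived * (n - derived)  # haplotype pairs that differ at each SNP
    n_pairs = n * (n - 1) / 2
    return float(diff_pairs.sum() / n_pairs / seq_len)

# Toy example: four haplotypes, three SNPs, 100 sites surveyed.
toy = np.array([[0, 1, 0],
                [1, 1, 0],
                [0, 0, 0],
                [1, 0, 0]])
pi = nucleotide_diversity(toy, seq_len=100)
```

Pooling haplotypes from several breeds before calling such a function mirrors the idea in the text that a combined group can reflect the ancestral haplotype pool better than any single inbred breed.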


Principal component and phylogenetic analysis 

When projecting the genotypes into a two-dimensional
space using a principal component analysis (PCA)
[19], all dogs cluster together tightly compared with the 
distribution seen for wolves (Figure 1G, inset). When 
inspecting the distribution among dogs, we find that dogs 
spread along three major geographic axes: southern East 
Asia, Europe and Africa. The northern Chinese indige- 
nous dogs and dog breeds from the Middle East/Arctic 
regions/Tibet fall between these three extremes (Figure 
1G). The observed pattern reflects the overall geographic 
locations of these groups following a clear East-West 
gradient, which matches quite well the observation from 
our structure analysis. 
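
The projection step can be sketched as follows, assuming a genotype matrix coded as 0/1/2 copies of the alternate allele and the commonly used per-SNP normalization by allele frequency. The function name and the toy matrix are hypothetical, and the software cited in [19] may differ in its details.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """Project individuals onto the leading principal components.

    genotypes: (n_individuals, n_snps) array of 0/1/2 allele counts
               (hypothetical input layout).
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                 # per-SNP allele frequency
    keep = (p > 0) & (p < 1)                 # drop monomorphic SNPs
    g, p = g[:, keep], p[keep]
    z = (g - 2 * p) / np.sqrt(2 * p * (1 - p))  # center and scale each SNP
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Toy example: three individuals from each of two differentiated groups.
toy = np.array([[0, 0, 2, 2],
                [0, 1, 2, 2],
                [1, 0, 2, 1],
                [2, 2, 0, 0],
                [2, 1, 0, 0],
                [1, 2, 0, 1]])
pcs = genotype_pca(toy)
```

In this toy case the first component separates the two groups by sign, which is the same kind of signal that places regional dog populations along distinct axes in the text.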

Combining our dataset with data from a previous SNP 
array study, which included a larger number of samples 
[20], we found that the southern Chinese indigenous dogs 
together with several East Asian dogs (e.g., Chow Chow, 
Akita, Chinese Shar-Pei) are closest to wolves (Figure 
1H). When the phylogenetic relationships among our 58 
samples are inspected, East Asian dogs spread over both 
sides of the deepest node connecting all dogs, while dogs 
from other continental areas coalesce into a subclade and 
then join with East Asian dogs. Thus, East Asian dogs are 
the most basal lineages connecting to gray wolves (Figure
1I). It is worth pointing out that the genomes of dogs
from Oceania (dingoes and New Guinea singing dogs), 
although being closer to wolves in the PCA plot (Figure 
1H), bear strong signals of admixture with gray wolves 
[6], which likely reflects their past history of admixture, 
before they migrated to Australia and New Guinea (Sup- 
plementary information, Data S4 and Figure S4). 


Admixture analysis 

Using the joint allele frequencies among all popula- 
tions in our study, we infer the split and admixture histo- 
ry among groups of populations using TreeMix [21]. If 
migration tracks are not allowed, then the relationships 
inferred from the TreeMix analysis (Figure 2A) directly 
reflect the patterns observed in our previous analyses 
including the structure (Figure 1D), the phylogenetic 
(Figure 1I) and the principal component analyses (Figure
1G). Thus, following the divergence between contempo- 
rary wolves and domestic dogs, the first partition within 
dogs is between the southern Chinese indigenous group 
and all other dogs. This is then followed by branching 
of the other dogs, largely matching the geographical
distance from southern East Asia: first, dogs from Central
Asia, northern China, and eastern Arctic, followed by 
dogs in Africa, the Middle East, and western Arctic, and 
the final group including all dog breeds in Europe and 
South/Central America. 

If migration tracks are allowed in TreeMix, there is 
strong statistical support for migrations among a few 
groups: (1) northern Chinese indigenous dogs show 
strong admixture from European dogs (Figure 2A and 
Supplementary information, Data S5, Figure S5, Tables 
S2 and S3); (2) gene flow from wolves to the African/ 
Middle Eastern dogs (Supplementary information, Figure 
S5); (3) migratory tracks from the southern Chinese dogs 
to the eastern Arctic group (i.e., Siberian Husky, Alaska 
Malamute and the Greenland dog; Supplementary infor- 
mation, Figure S5). When all possible migration events 
in the history of these samples are examined using the 
F3/F4 test [22], there is again strong statistical support
for all the migration events listed above (Supplementary 
information, Data S5). 


Long-term evolutionary trajectories for wolves and dogs 

Using the divergence between the two haploid ge- 
nomes within individuals, the pairwise sequentially 
Markovian coalescent (PSMC) model provides a method 
for investigating the long-term trajectories in popula- 
tion sizes [23]. To translate demographic history into 
real-time units, estimation of an accurate mutation rate 
is very important. Previously, several different mutation 
rates were used, but they were generally not carefully 
calibrated (Supplementary information, Data S6) [24]. 
Using multiple outgroup species to the dog (e.g., horse 
and cat), our estimate of the mutation rate for the lineage
leading to the domestic dog is 2.2 × 10⁻⁹ per site
per year (Supplementary information, Data S6 and Table 
S4), a rate similar to those from several earlier studies 
[25, 26]. Using this mutation rate, we estimate dates for 
the population history of dogs and wolves. As shown in 
Figure 2B, a decrease in the size of the ancestral wolf 
population started to occur 2 million years ago, reaching 
a saddle point about 300 000-400 000 years ago. The ancestral
population then increased in size, peaking at around 
200 000 years ago. After a subsequent small decline in 
population size, wolves and dogs started to diverge from 
each other between 20 000 and 100 000 years ago (see 
next section for a more precise dating). Although all do- 
mestic dogs drastically decreased in population size after 
the population split, the wolf population experienced a 
slight growth, possibly as a consequence of the megafau- 
na extinctions (i.e., late Quaternary extinction) [27] that 
provided gray wolves with better food resources due to 
reduced competition from other predators. 
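
To illustrate why the mutation rate matters for dating, the sketch below converts a per-site sequence divergence into years under a simple molecular clock, where two lineages that split T years ago are expected to differ at d = 2μT sites per base pair. This ignores ancestral polymorphism and is not the PSMC scaling itself; the function name and the numeric values in the example are illustrative assumptions.

```python
def divergence_time_years(per_site_divergence, mutation_rate_per_year):
    """Molecular-clock estimate T = d / (2 * mu).

    per_site_divergence:    pairwise differences per site between lineages.
    mutation_rate_per_year: substitutions per site per year (assumed known).
    """
    if mutation_rate_per_year <= 0:
        raise ValueError("mutation rate must be positive")
    return per_site_divergence / (2.0 * mutation_rate_per_year)

# Illustrative values only: halving the assumed rate doubles the inferred time.
d = 1.452e-4
t_fast = divergence_time_years(d, 2.2e-9)
t_slow = divergence_time_years(d, 1.1e-9)
```

Because the inferred time scales inversely with the assumed rate, a poorly calibrated rate shifts every date in the demographic history by the same factor.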
Time of divergence between contemporary wolves and 
dogs 

TreeMix and phylogenetic analyses identified southern 
Chinese indigenous dogs as the most basal population 
compared to wolves, from which all other dog popula- 
tions diverged. We therefore used joint allele frequencies 
between the 12 gray wolves and the 11 southern Chinese 
indigenous dogs, to infer the demographic history for 
these two populations with the dadi package [28]. Similar 
to the result from the PSMC analysis, the wolf popula- 
tion experienced a very mild population growth (1.26-fold 
increase) that started around 290 000 years ago (Figure 
2C). The time of divergence for the wolf and dog popu- 
lations is inferred to be around 33 000 years ago, where 
the domestic dog lineage expanded from a population of 
4 600 individuals to about 17 500. 
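
The joint allele frequencies used as input for this kind of inference can be summarized as a two-dimensional site frequency spectrum. The sketch below builds one from per-SNP derived-allele counts in two populations. It uses plain NumPy rather than the dadi package's own interface, and the function name and the toy counts are hypothetical.

```python
import numpy as np

def joint_sfs(counts_pop1, counts_pop2, n_chrom1, n_chrom2):
    """Two-population site frequency spectrum.

    counts_pop1, counts_pop2: derived-allele counts per SNP in each population.
    n_chrom1, n_chrom2:       number of sampled chromosomes in each population.
    Entry [i, j] is the number of SNPs with i derived copies in population 1
    and j derived copies in population 2.
    """
    c1 = np.asarray(counts_pop1, dtype=int)
    c2 = np.asarray(counts_pop2, dtype=int)
    sfs = np.zeros((n_chrom1 + 1, n_chrom2 + 1), dtype=int)
    np.add.at(sfs, (c1, c2), 1)  # accumulate repeated (i, j) pairs correctly
    return sfs

# Toy example with 12 diploid wolves (24 chromosomes) and
# 11 diploid dogs (22 chromosomes), and four SNPs.
wolf_counts = [1, 1, 3, 0]
dog_counts = [0, 0, 2, 5]
sfs = joint_sfs(wolf_counts, dog_counts, 24, 22)
```

Demographic models are then fitted by comparing the spectrum expected under a given set of population sizes, split times, and migration rates with the observed one.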

In addition to gauging changes in population size, sta- 
tistical methods can also estimate the rates of exchange 
of migrants between two populations. The migration 
rate (2Nm) from the dog lineage to the wolf lineage is 
estimated to be 0.97, while the other direction (wolves to 
dogs) is inferred to be 5.02, showing a clear asymmetry 
in the migration rates [29]. 

Examination of the sequence divergences between the 
multiple populations using a Markov chain Monte Carlo 
(MCMC) approach [30, 31] (Supplementary information, 
Data S7, Figures S6-S8, Tables S5 and S6) reveals a 
similar profile for the history between wolves and dogs, 
which includes a slight growth in the wolf population and 
an ancient divergence between wolves and dogs (Supple- 
mentary information, Data S7 and Table S5). In summa- 
ry, multiple levels of genetic information (i.e., both joint 
site frequencies as well as sequence divergence) support 
an ancient split between dogs and wolves. 


The geographical origins of dogs: a single origin in 
southern East Asia 

In order to identify the most probable geographical 
origin of dogs, we hypothesized that similar to many 
organisms, the geographical origin of a species holds 
the greatest genetic diversity, and the global relationship 
among multiple populations will, in the absence of strong 
influence of admixture, follow a serial founder model [32, 
33]. In the case of dogs, the wild ancestor, the wolf, has 
been present alongside the dog throughout Eurasia, implying 
that intense dog-wolf admixture could possibly have in- 
fluenced this pattern. 

Despite the concern about the confounding effect of wolf/ 
dog gene flow, the TreeMix analysis, F3/F4 test as well 
as the demographic analysis suggest that gene flow be- 
tween dogs and wolves is relatively mild. In Supplemen- 
tary information, Data S8, we review the evidence for 
dog/wolf gene flow from our study, as well as from multiple
previous studies. The combined evidence shows that
the migration rates (2Nm) are mostly around one or less 
(a maximum of five found in the dadi analysis) and that 
the admixture proportion is normally around 10%, with 
a maximum of 16% for the Middle East (Supplementa- 
ry information, Data S8). Low levels of migration are 
detected between wolves and dogs across Eurasia when 
the very sensitive D test is used [34, 35] (Supplementa- 
ry information, Data S8). Thus, we conclude that while 
dog-wolf gene flow has occurred throughout the history of 
the domestic dog, it has been at a moderate level and the 
level of admixture has been relatively similar across Eur- 
asia (Supplementary information, Data S8). Without the 
strong influence of admixture [32], we may assume that 
genetic diversity is highest at the place of origin and that 
the global relationship among the multiple populations 
follows a serial founder model reflecting their dispersal 
routes [33]. 
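
The D test mentioned above can be sketched from allele frequencies as follows, for a four-population arrangement (P1, P2, P3, outgroup) in which P1 and P2 are sister populations. Assigning two dog populations to P1 and P2 and wolves to P3 is an assumption made for illustration, as are the function name and the toy data; a real analysis would add a block-jackknife standard error.

```python
import numpy as np

def d_statistic(p1, p2, p3, outgroup):
    """ABBA-BABA D statistic from per-SNP derived-allele frequencies.

    D > 0 indicates excess allele sharing between P2 and P3,
    D < 0 indicates excess sharing between P1 and P3.
    """
    p1, p2, p3, o = (np.asarray(x, dtype=float) for x in (p1, p2, p3, outgroup))
    abba = (1 - p1) * p2 * p3 * (1 - o)
    baba = p1 * (1 - p2) * p3 * (1 - o)
    denom = np.sum(abba + baba)
    if denom == 0:
        raise ValueError("no informative sites")
    return float(np.sum(abba - baba) / denom)

# Toy example: P2 shares every derived allele with P3, P1 shares none.
d_max = d_statistic([0, 0], [1, 1], [1, 1], [0, 0])
```

When P1 and P2 carry identical frequencies the statistic is zero, which corresponds to no asymmetric gene flow from P3 into either population.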

It is tempting to draw conclusions about the origin 
of dogs from the high genetic diversity observed in the 
Chinese indigenous dogs. However, comparing breed 
dogs with indigenous dogs at the individual level is like- 
ly misleading since most of the differences in genetic 
diversity are probably caused by recent bottleneck events 
rather than their distant origin [1]. Thus, we combine 
multiple breeds in each region as a group representing 
the ancestral haplotype pool giving rise to the contem- 
porary dogs of that region. Our analysis shows that dogs 
from East Asia have the highest genetic diversity (Figure 
1E). This suggests that the ancestral population that gave 
rise to East Asian dogs was much larger than ancestral 
populations in other regions (e.g., Europe). The linkage 
disequilibrium pattern also shows the same trend (Figure 
1F). Higher levels of genetic diversity in East Asian dogs 
are also observed in mtDNA and Y chromosome data [7, 
12, 36]. 

Besides group diversity, in the phylogenetic and TreeMix
analyses, the deepest node connecting all dogs
separates into two clades, one of which is composed of
only East Asian dogs, while the other clade includes both
East Asian and non-East Asian dogs (Figures 1I and 2A,
and Supplementary information, Figure S5). Dogs from
Africa and Europe share a most recent common ancestor,
which then coalesces with dogs from East Asia (Figures
1I and 2A). Notably, this basal position of East Asia is
robust to the levels of migrations between wolves and 
dogs (Supplementary information, Data S9, Figure S9, 
and Table S7). The basal position of East Asian dogs is 
similar to the pattern observed for Africans within human 
populations [37]. 

In addition to the observations based on group level 


diversity and the basal phylogenetic position, the PCA 
pattern also provides supporting evidence for the south- 
ern East Asian origin of dogs. As the amount of genetic 
drift in basal groups is typically lower due to their larger 
population sizes, we expect them to display a closer ge- 
netic relationship with wolves in the PCA plot (Figure 
2A). When we simulate a serial founder model that mim- 
ics the history of dog domestication, we can easily gener- 
ate a pattern that is similar to that shown in Figure 1G (see 
also Supplementary information, Data S10 and Figure 
S10). Thus, in our analysis, we find dogs with ancestry 
in southern East Asia to be closest to wolves, and also a 
geographical distribution of the populations following a 
clear east-west gradient, indicating serial founder events. 
It is important to emphasize that admixture between 
wolves and dogs is unlikely to have created the observed 
pattern, given that the dog-wolf admixture rate in East 
Asia is not higher than that seen in other regions (Sup- 
plementary information, Data S8). 
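
A serial founder model can be illustrated with a very small simulation: at each dispersal step a new population is founded by a limited number of individuals drawn from the previous one, and expected heterozygosity falls by a factor of about 1 - 1/(2N) per step. This is a minimal sketch under assumed parameters (number of loci, founders per step, number of steps, random seed) and is not the simulation described in the Supplementary information.

```python
import numpy as np

def serial_founder_heterozygosity(n_loci=5000, n_founders=10, n_steps=4, seed=0):
    """Mean expected heterozygosity after each successive founder event.

    Each step samples 2 * n_founders chromosomes binomially from the
    allele frequencies of the previous population (all values hypothetical).
    """
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.05, 0.95, size=n_loci)  # source population
    het = [float(np.mean(2 * freqs * (1 - freqs)))]
    for _ in range(n_steps):
        n_chrom = 2 * n_founders
        freqs = rng.binomial(n_chrom, freqs) / n_chrom  # founder sampling
        het.append(float(np.mean(2 * freqs * (1 - freqs))))
    return het

het_by_step = serial_founder_heterozygosity()
```

The returned list starts with the source population and declines with each founding step, which is the pattern of decreasing diversity with distance from the origin that the argument in the text relies on.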

Having identified southern East Asia as the likely or- 
igin of dogs, we asked whether the domestic dog may 
have originated in more than one region through separate 
domestication events. In order to test whether multiple 
origins are compatible with the observed data, we performed
simulations mimicking different scenarios (Supplementary
information, Data S11 and Figure S11). Our
results show that, if there were multiple origins for dogs 
from separate wolf populations, the descendant popula- 
tions would tend to reside in separate clusters in the PCA 
plot, which is in contrast to what we observe (Figure 1G, 
inset). Thus, the scenario that the domestic dog originated multiple 
times in different geographical areas is not compatible 
with the observed genetic patterns found in our genome 
data. 


The out of southern East Asia history for the domestic 
dog 

To study the subsequent global history of the dog, 
we used an MCMC approach to date several important 
transitional points among the major clades (Figure 2A). 
Our analysis supports the split between the southern Chi- 
nese indigenous dogs and all other dogs across the world 
around 15 000 years ago, thus indicating a radiation of 
dogs out of southern East Asia earlier than the origin of 
agriculture (Supplementary information, Data S7 and 
node 2 in Figure 2A and 2D) [38]. After radiating from 
southern East Asia, possibly following existing human 
settlements at the time (Supplementary information, Data 
S12 and Figure S12), the out of southern East Asia lineage
spread to the Middle East/Africa and arrived in Europe
by about 10 000 years ago (Supplementary information,
Data S7; node 3 in Figure 2A and 2D). Notably, one
of the out of southern East Asia lineages migrated back
to northern China, meeting endemic Asian lineages that
had spread from southern East Asia and yielding a series
of admixed populations, including the northern Chinese
indigenous dogs and the Arctic dog breeds (Figure 2A
and 2D).

Figure 2 Demographic and migration histories for the domestic dog. (A) Tree topology inferred from TreeMix when no migratory
tracts are allowed. The drift parameter is the amount of genetic drift along each population. Further inferred migratory
tracts are shown in the bottom-left corner of the panel. The three important nodes are those for which we have provided extensive
dating information. (B) The PSMC plot for all the individuals. Gray lines plot the benthic δ18O levels, which are a proxy for
global temperature [61]. The span of the current ice age (Quaternary ice age, 2.58M-now) is shown with an arrow. The x-axis
is time plotted in log scale and the y-axis is effective population size. (C) Inferred population demographic history between
wolves and southern East Asian indigenous dogs using the joint site frequency spectra. (D) A proposed migratory history for
domestic dogs across the world based on the evidence from our study. Solid arrows represent migratory tracts for which we have
dating information, while dashed arrows indicate those without accurate dating.

Several dog breeds from South and Central America 
(i.e., Chihuahua, the Mexican and Peruvian naked dog) 
show no signs of admixture, while the Arctic breeds, 
Alaska Malamute and the Greenland dog, display extensive
admixture from the southern Chinese indigenous
lineage [39]. Possibly, this reflects that the human colo- 
nization of the New World occurred in several waves, in 
which dogs may have followed in different time periods 
[40] (Figure 2D). Using the patterns of the admixture 
tracks, we estimate that the time of the admixture for 
the northern Chinese indigenous dogs was quite ancient 
(around 10 500 years ago, Supplementary information, 
Data S13 and Figure S13) [40]. The relatively recent 
origin of European dogs (i.e., ~10 000 years) together 
with this rather ancient admixture suggests that multiple 
lineages travelled to the Far East from the Middle East/ 
Europe. 


Population structure among wolves 

Our structure and principal component analyses do 
not reveal any population substructure among the gray 
wolves collected for this study (Figure 1D). The high 
migratory ability of the gray wolf might allow the popu- 
lations to remain highly homogenous across the eastern 
part of Eurasia [41]. A previous study using wolves from 
the Middle East (Israel), Europe (Croatia) as well as Chi- 
na found genetic differentiation among these wolf pop- 
ulations [6]. When these three individuals are overlaid 
on the large PCA plot, the wolves from western Eurasia 
do not group together with the wolves we collected from 
eastern Eurasia, and they are genetically closer to dogs 
(Supplementary information, Data S14 and Figure S14). 
Given the fact that Middle Eastern wolves generally have 
more dog admixture [6], the observed difference might 
not represent true population differentiation among 
wolves. Nevertheless, it is possible that some wolves 
have recently diverged from each other [8], as there is 
weak isolation between the wolves from eastern and 
western Eurasia. Explicit testing for potential admixture 
between wolves and dogs sampled in our study finds ev- 
idence of gene flow between wolves and local dog popu- 
lations in each region, albeit the magnitude is low (Sup- 
plementary information, Table S8). Further study on the 
genetic and geographic relationships between dogs and 
wolves is one of the important tasks for the community. 


Domestication genes 

Our analyses indicate that the Chinese indigenous 
dogs represent an intermediate form between wolves 
and breed dogs, and they have not experienced intense 
artificial selection. Analyses of Chinese indigenous dogs 
therefore allow us to stratify the domestication process 
in dogs, and investigate the role of positive selection that 
occurred specifically during the first stage of domesti- 
cation. Using a statistical method that explicitly models 
selective sweeps [42], we have identified the top 1% of 
the genome bearing strong statistical evidence of positive 
selection in the southern Chinese indigenous dogs. In 
Table 1, we list the categories of genes that show statis- 
tical significance by a gene enrichment-based analysis. 
Groups of genes showing the strongest evidence of posi- 
tive selection are those related to metabolism and motili- 
ty, neurological process and perception as well as sexual 
reproduction (Table 1 and Supplementary information, 


Data S15, Tables S9 and S10). Genes that seem to have 
been positively selected in subsequent evolutionary steps, 
including dog breed formation, are related to the control 
of developmental processes and to metabolism (see a full 
discussion of candidate genes involved in transforming 
wild wolves to dogs in Supplementary information, Data 
S15). 
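
The step of retaining the top 1% of the genome can be sketched as a simple quantile cutoff over per-window selection scores. The scores here stand in for whatever statistic the sweep-detection method in [42] produces; the function name, the window layout, and the toy scores are assumptions for illustration.

```python
import numpy as np

def top_fraction_windows(scores, fraction=0.01):
    """Indices of genomic windows whose score falls in the top fraction.

    scores:   one selection score per window (hypothetical input).
    fraction: share of windows to keep, 0.01 for the top 1%.
    """
    scores = np.asarray(scores, dtype=float)
    cutoff = np.quantile(scores, 1 - fraction)
    return np.flatnonzero(scores >= cutoff)

# Toy example: 1000 windows with increasing scores.
candidates = top_fraction_windows(np.arange(1000))
```

Genes overlapping the retained windows would then be passed to an enrichment analysis such as the one summarized in Table 1.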

Among the candidate positively selected genes
in the first stage of dog domestication, one class of genes
is related to memory and long-term potentiation (LTP),
which is widely considered to be the major cellular 
mechanism underlying learning and memory [43]. For 
example, GRIA1 (glutamate receptor, ionotropic, AMPA 
1) is an important protein that mediates excitatory synap- 
tic transmission in the central nervous system and plays 
a key role in hippocampal synaptic LTP and long-term 
depression (LTD). Interestingly, a suite of other genes, 
including GRIN2A (glutamate receptor, ionotropic, 
N-methyl D-aspartate 2A), are also found to be heavily 
involved in LTP and LTD (Table 1). The large physiolog- 
ical and behavioral changes empowered by these genes 
may have enabled the transformation of gray wolves to 
domestic dogs, allowing them to flourish in the human 
environment. 


Discussion 


Based on genome sequences from a worldwide collec- 
tion of dogs, especially a large collection of indigenous 
dogs from southern East Asia, this study provides strong 
genetic evidence that the domestic dog originated in 
southern East Asia. The analyses give a coherent picture, 
where the indigenous dogs in southern East Asia or East 
Asia in general stand out compared to other populations, 
with higher genetic diversity as a group, and occupying 
a basal position next to wolves. Other dog populations
show a progressive ancestry gradient away from wolves,
starting from southern East Asia. Notably, these findings
corroborate earlier work based on mtDNA and Y-chro- 
mosomal DNA [7, 36]. Thus, studies based on compre- 
hensive global samples and diverse types of genetic data 
(e.g., autosomes, Y chromosome, mtDNA) converge on 
the same story about the origin of the domestic dog. 

Table 1 Gene ontology analysis of genes selected during the first stage of dog domestication

GO category                                           Number of genes   P value
Metabolism and motility
  ATPase activity¹                                    19                0.0056
  ATP binding¹                                        55                0.0192
  Actin binding¹                                      17                0.0195
  Nucleoside triphosphate metabolic process²          9                 0.0208
  ATP metabolic process²                              8                 0.0211
  Phasic smooth muscle contraction²                   3                 0.0315
  Ribonucleotide metabolic process²                   9                 0.0385
  Purine nucleoside triphosphate metabolic process²   8                 0.0399
  ATPase activity, coupled¹                           14                0.0402
  mRNA metabolic process²                             17                0.0423
  Receptor metabolic process²                         4                 0.0448
  Drug metabolism³                                    8                 7.82E-04
  Metabolism of xenobiotics by cytochrome P450³       7                 0.0031
  ABC transporters³                                   6                 0.0056
  Glutathione metabolism³                             6                 0.0075
  Retinol metabolism³                                 5                 0.0452
Neurological process and perception
  Memory²                                             6                 0.0034
  Regulation of sensory perception²                   3                 0.0315
  Regulation of sensory perception of pain²           3                 0.0315
  Learning or memory²                                 8                 0.0352
  Regulation of neurotransmitter levels²              6                 0.0359
  Long-term memory²                                   3                 0.0446
  Synaptic transmission²                              14                0.049
  Long-term potentiation³                             6                 0.0291
Sexual reproduction
  Germ cell development²                              7                 0.0481

¹Molecular function; ²Biological process; ³KEGG pathway

The origins of the global domestic dog populations
can be traced to two important demographic steps: first,
dog and wolf populations started to diverge from each
other 33 000 years ago in southern East Asia (matching
several previous findings [8, 10]). Subsequently there
was a global dispersal of dogs out of southern East Asia
around 15 000 years ago. The long persistence of the
domestic dog lineage in southern East Asia opens up
interesting scenarios. One possible explanation for the
33 000-year deep divergence between dogs and wolves
is that it represents a split among wolf populations, and 
that South Chinese wolves (ancestors to the dog) were 
genetically differentiated from the more northern wolves 
sampled in our study. In this case, the global expansion 
of dogs out of southern East Asia around 15 000 years 
ago may correspond with the origins of actual domes- 
tic dogs. This scenario is contradicted by the fact that 
wolves in our study display no apparent genetic substructure
(Supplementary information, Data S14). An alternative
scenario is that the ancient dog-wolf split actually
constitutes the first step in the domestication of wolves 
and evolution to domestic dogs. It is possible that the 
ecological niche unique in southern East Asia provided 
an optimal refuge for both humans and the ancestors of 
dogs during the last glacial period (110 000-12 000 years ago,
with a peak between 26 500 and 19 000 years ago) [44]. 
The mild population bottleneck in dogs suggests that dog
domestication may have been a long process that started 
from a group of wolves that became loosely associated 
and scavenged with humans, before experiencing waves 
of selection for phenotypes that gradually favored stronger
bonding with humans (a process called self-domestication)
[1]. The presence of genes involved in neurological processes
among the candidates for positive selection
may be a manifestation of this dynamic process (Supplementary
information, Data S15). After this long-term
nurturing, humans and dogs might have eventually come 
together with a strong bond for each other. Thus, the his- 
tory of dogs might involve three major stages: (a) loosely 
engaged pre-domesticated scavengers, (b) domesticated 
non-breed dogs with close human-dog interactions, and (c)
breed formation following intense human selection for
diverse sets of phenotypic traits. The study of Chinese
indigenous dogs thus provides missing links that connect
these three major stages [45, 46].

The exact time when dogs reached the Middle East 
is difficult to estimate with our sample since the Mid- 
dle Eastern dogs (and also African dogs) bear relatively 
strong signals of introgression from wolves (Figure 2A). 
However, demographic inferences suggest that dogs had 
arrived in Europe by about 10 000 years ago (Figure 2D 
and Supplementary information, Data S7), a short time 
after the origin of agriculture in the Middle East [38]. It 
is notable that the global spread of dogs around 15 000 
years ago corresponds well with the generally accepted 
earliest archaeological evidence of dogs across Eurasia 
[11]. As there is little evidence of westward human mi- 
grations from southern East Asia around 15 000 years 
ago, the initial spread of the domestic dog out of Asia 
may in part have been a self-initiated dispersal driven by 
environmental factors (e.g., the retreat of the glacial cov- 
erage that started about 19 000 years ago). The specific 
route domestic dogs used to migrate to the Middle East, 
Africa and Europe remains to be uncovered (Figure 2D 
and Supplementary information, Data S12). Some of this
dispersal might have been heavily influenced by humans, as dogs
were often part of the civilization package that traveled 
together as agriculture spread [47] (Figure 2D). Further 
studies using samples from western Eurasia should re- 
veal insights into these early dog migrations [6]. 

Despite the strong patterns presented by the genetic 
data, archaeological evidence supporting an East Asian 
origin is missing [11]. Several important factors further 
confound current analysis. First, the morphological dif- 
ferences between dogs and gray wolves are not always 
very clear-cut, especially for specimens from the early 
phase of dog domestication [48]. In fact, a recent an- 
cient DNA study has ruled out several ancient dog-like 
specimens found in Europe [13]. Second, archaeological 
studies in the Far East are generally lagging behind those 
in Europe, with most of the ancient dog-like fossils from 
before 12 000 years ago being found outside of East Asia 
[11]. This could also be due to the unfavorable environ- 
mental conditions for preserving fossils in southern East 
Asia. Nevertheless, it is possible that multiple primitive 
forms of the dog existed, including in Europe [13, 49]. 
However, in this case, the genetic pattern presented here 
shows that those lineages were replaced by dogs that mi- 
grated from southern East Asia, and thus made negligible 
contributions to the modern dog gene pool (Figure 1D). 

This study opens many potential avenues for future 
research (Figure 2D). For example, the history of the 
American colonization and the scale of wolf-dog admixture
in the Middle East and Africa remain largely unexplored,
especially given the limited coverage of our African
samples [50]. Analysis of additional samples from
other parts of the world (especially the Indian coastal 
region and northern Eurasia as well as Africa) should al- 
low us to draw a more complete picture of the worldwide 
migration patterns, and their association with human 
populations. Comprehensive analyses of ancient canid 
genomes will provide genetic information from multiple 
time points for elucidating the initial steps of dog history, 
and identifying putative population replacements that
may have influenced the gene pool of modern dogs [8].

The study of Chinese indigenous dogs has provided an 
unprecedented opportunity for illuminating the history 
of selection during dog domestication. For example, the 
initial selection on the domestic dog is found to be strongly
associated with an enrichment of genes affecting behav- 
ior and motility. As dogs established stronger bonds with 
humans, possibly empowered by the origin of modern 
agriculture in the Middle East and China [51], strong 
selection for genes involved in metabolism and morphol- 
ogy/development emerged (Supplementary information, 
Data S15). Our study, for the first time, begins to reveal 
a large and complex landscape upon which a cascade of 
positive selective sweeps occurred during the domesti- 
cation of dogs. The domestic dog represents one of the 
most beautiful genetic sculptures shaped by nature and 
man. 


Materials and Methods 


Sample collection and sequencing 

Total genomic DNA was extracted from the blood or tissue 
samples of the animals using the phenol/chloroform method. For 
each individual, 1-3 µg of DNA was sheared into fragments of
200-800 bp with the Covaris system. DNA fragments were then 
processed and sequenced using the Illumina HiSeq 2000 platform. 


Sequence data pre-processing and variant calling 

Raw sequence reads were mapped to the dog reference genome 
(Canfam3) using the Burrows-Wheeler Aligner (BWA) [52]. 
Sequence data were next subjected to a strategic procedure for 
variant calling using the Genome Analysis Tool Kit (GATK) [17]. 
During base and variant recalibration, a list of known SNPs/indels
downloaded from the Ensembl database was used as the training
set. Small indels were separately called using SAMtools mpileup 
[53]. 


Genetic diversity, linkage disequilibrium and structure 
analysis 

Beagle was used to impute the missing genotypes and phase of 
the genotypes into the associated haplotypes [54]. Genetic diver- 
sity for each individual, as well as for several sub-groupings, was 
calculated using a custom Python script. Linkage disequilibrium
for the different populations was calculated using the Haploview
software [55]. Population structure analysis was done using the
EM algorithm implemented in the Frappe package [18]. Principal
component analysis was carried out using the smartPCA program
from the Eigensoft package [19]. Unweighted Pair Group Method 
with Arithmetic mean (UPGMA) tree was built based on the ge- 
netic distances calculated from whole genome data [56]. 
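
The custom script used for the diversity calculation is not reproduced in this article. As a minimal sketch only, and assuming that genetic diversity is measured as nucleotide diversity (the average number of pairwise differences per site) at biallelic SNPs, the calculation could look as follows. The function name and its parameters are hypothetical and are not taken from the authors' code.

```python
from typing import Sequence


def nucleotide_diversity(alt_counts: Sequence[int],
                         n_chromosomes: int,
                         n_sites: int) -> float:
    """Average pairwise differences per site (assumed diversity measure).

    alt_counts    -- alternate-allele count at each variable biallelic site
    n_chromosomes -- sampled chromosomes (2 x number of diploid individuals)
    n_sites       -- all callable sites in the region, variable or not
    """
    if n_chromosomes < 2 or n_sites <= 0:
        raise ValueError("need at least 2 chromosomes and 1 site")
    # Number of distinct chromosome pairs in the sample.
    n_pairs = n_chromosomes * (n_chromosomes - 1) / 2
    total = 0.0
    for k in alt_counts:
        if not 0 <= k <= n_chromosomes:
            raise ValueError("allele count outside 0..n_chromosomes")
        # k * (n - k) of the pairs carry different alleles at this site.
        total += k * (n_chromosomes - k) / n_pairs
    return total / n_sites
```

Applied per individual, `n_chromosomes` is 2 and the value reduces to the fraction of heterozygous sites; applied to a sub-grouping, it pools all chromosomes in that group.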


Estimation of mutation rate from between-species comparisons

Multiple species alignment data were downloaded from the 
Ensembl database. We used human as the outgroup and chose a
second species (cat, horse or cattle) as the sister species to the dog. 
For each possible sister species, we did a three species comparison 
(human, (dog, sister_species)) by extracting information from the 
multiple species alignments. Branch lengths along the dog lineage 
were estimated using the baseml program from the PAML package
[57]. The long-term evolutionary rate along the dog lineage was then
calculated as the branch length divided by the divergence time
between the sister species and the dog.
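
The rate calculation described above is a single division, repeated for each candidate sister species. The sketch below illustrates it under stated assumptions: the branch lengths and divergence times in the usage comment are hypothetical placeholders chosen for illustration, not values from this study, and the function name is our own.

```python
def long_term_rate(branch_length: float, divergence_time_years: float) -> float:
    """Substitutions per site per year along the dog lineage.

    branch_length         -- dog-lineage branch length in substitutions per
                             site (for example, as estimated by baseml)
    divergence_time_years -- divergence time between dog and sister species
    """
    if divergence_time_years <= 0:
        raise ValueError("divergence time must be positive")
    return branch_length / divergence_time_years


# Hypothetical usage: one estimate per sister species, placeholder inputs.
# estimates = {
#     species: long_term_rate(length, time)
#     for species, (length, time) in {"sister_species": (0.08, 8.0e7)}.items()
# }
```

Comparing the estimates obtained with different sister species gives a rough check on how sensitive the rate is to the choice of calibration.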


Population admixture and demographic analysis 

Population level admixture analysis was first carried out using 
the TreeMix program [21]. The threepop/fourpop module from the 
TreeMix package was used to perform the F3/F4 tests [22]. The PSMC
model was used to estimate the population histories from the
individual genomes [23]. Since sequence coverage is an important
factor in determining the inferred population sizes, a correction 
factor was invoked to correct for false negatives in SNP calling 
(Supplementary information, Data S2). 
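
To make the logic of the F3/F4 tests concrete, the following is a simplified sketch of the two statistics computed from population allele frequencies. It is not the TreeMix implementation: production tools also correct for finite sample size and estimate standard errors (for example by block jackknife), both of which are omitted here. The function names are our own.

```python
from typing import Sequence


def _check_lengths(*freqs: Sequence[float]) -> int:
    """Return the number of SNPs, requiring equal-length, non-empty inputs."""
    n = len(freqs[0])
    if n == 0 or any(len(f) != n for f in freqs):
        raise ValueError("frequency vectors must be non-empty and equal length")
    return n


def f3(c: Sequence[float], a: Sequence[float], b: Sequence[float]) -> float:
    """Naive f3(C; A, B): mean over SNPs of (c - a) * (c - b).

    A clearly negative value suggests that C is admixed between
    populations related to A and B.
    """
    n = _check_lengths(c, a, b)
    return sum((ci - ai) * (ci - bi) for ci, ai, bi in zip(c, a, b)) / n


def f4(a: Sequence[float], b: Sequence[float],
       c: Sequence[float], d: Sequence[float]) -> float:
    """Naive f4(A, B; C, D): mean over SNPs of (a - b) * (c - d).

    A value near zero is expected when (A, B) and (C, D) form
    unadmixed clades in the population tree.
    """
    n = _check_lengths(a, b, c, d)
    return sum((ai - bi) * (ci - di)
               for ai, bi, ci, di in zip(a, b, c, d)) / n
```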

The joint site frequency spectrum between wolves and the 
southern Chinese indigenous dogs was used to infer the population 
history using the dadi package [28]. A lineage-specific substitution
matrix was first estimated using the ambiore package [58] with the
whole genome sequence alignments between the outgroup (dhole)
(Supplementary information, Table S1) and the dog genome. A
corrected site frequency spectrum (SFS) was then used to perform
the demographic inference. 

Since the ancestral population of wolves might not have been 
at equilibrium, we allowed the wolf population to change continuously
from an equilibrium population at some time in the past (T1).
During the continuous change (i.e., from T1 to now), at some more 
recent time T2, the dog population split off, and started to change 
its size continuously from an initial size (S1) to an end size (S2) 
(Figure 2C). 
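
The text specifies only that the dog population changed continuously from S1 to S2 after the split. As an illustration, and assuming an exponential trajectory (a common choice for continuous size change, but an assumption here rather than a detail stated above), the population size at any time after the split could be sketched as follows. The function and parameter names are hypothetical.

```python
def dog_population_size(elapsed: float, t2: float, s1: float, s2: float) -> float:
    """Population size `elapsed` time units after the split.

    elapsed -- time since the dog lineage split off (0 = at the split)
    t2      -- time of the split before present, in the same units
    s1, s2  -- initial size at the split and end size at present

    Assumes exponential change: s1 at elapsed = 0, s2 at elapsed = t2.
    """
    if t2 <= 0 or s1 <= 0 or s2 <= 0:
        raise ValueError("t2, s1 and s2 must be positive")
    if not 0 <= elapsed <= t2:
        raise ValueError("elapsed must lie between 0 and t2")
    return s1 * (s2 / s1) ** (elapsed / t2)
```

Under this assumption the size halfway through the interval is the geometric mean of S1 and S2.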

Bayesian analysis of the species evolutionary history was conducted
using both the BPP and G-PhoCS packages independently
on noncoding sequences extracted from the polymorphism data [30, 
31]. Population admixture time was estimated using the HAPMIX 
program [40]. We used southern Chinese indigenous dogs and 
breed dogs as the two source populations for the northern Chinese 
indigenous dogs. The genetic distances between SNPs were extracted
from a previously published genetic map [59]. The overall
admixture time was inferred by maximizing the combined likelihood
across all individuals.
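
The step of combining likelihoods across individuals can be sketched as follows. This assumes that the analysis was run for each individual over a shared grid of candidate admixture times, and that individuals are treated as independent so that their log-likelihoods add. Both are assumptions made for illustration, and the function name and input layout are hypothetical.

```python
from typing import Dict, Sequence


def best_admixture_time(times: Sequence[float],
                        loglik_by_individual: Dict[str, Sequence[float]]) -> float:
    """Return the candidate time with the highest summed log-likelihood.

    times                -- grid of candidate admixture times
    loglik_by_individual -- per-individual log-likelihoods, one per time
    """
    if not times or not loglik_by_individual:
        raise ValueError("need at least one time and one individual")
    combined = [0.0] * len(times)
    for name, logliks in loglik_by_individual.items():
        if len(logliks) != len(times):
            raise ValueError(f"{name}: expected {len(times)} values")
        # Independent individuals: multiply likelihoods = add log-likelihoods.
        for i, value in enumerate(logliks):
            combined[i] += value
    best_index = max(range(len(times)), key=combined.__getitem__)
    return times[best_index]
```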


Targets of positive selection 

The SweepFinder algorithm was used to extract regions of the 
genome that show the strongest signals of positive selection [42]. 
The genome-wide site frequency spectrum was used as the background
site frequency distribution before fitting a sweep model to
the data. Gene Ontology (GO) analysis was carried out using DAVID
[60].
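
The idea behind a gene enrichment analysis can be illustrated with a one-sided hypergeometric test. This is a sketch of the general principle only: DAVID reports a modified, more conservative Fisher exact statistic (the EASE score), so the unmodified test below should not be expected to reproduce the P values in Table 1. The function and parameter names are our own.

```python
from math import comb


def enrichment_p_value(n_background: int, n_category: int,
                       n_selected: int, n_overlap: int) -> float:
    """P(X >= n_overlap) under the hypergeometric distribution.

    n_background -- annotated genes in the background set
    n_category   -- background genes belonging to the category
    n_selected   -- genes in the candidate (selected) list
    n_overlap    -- candidate genes that belong to the category
    """
    if not (0 <= n_category <= n_background and 0 <= n_selected <= n_background):
        raise ValueError("category and selection must fit in the background")
    upper = min(n_category, n_selected)
    if not 0 <= n_overlap <= upper:
        raise ValueError("overlap outside the possible range")
    total = comb(n_background, n_selected)
    # Sum the probabilities of observing n_overlap or more category genes.
    tail = sum(
        comb(n_category, k) * comb(n_background - n_category, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```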

For detailed Materials and Methods see Supplementary infor- 
mation, Data S16. A separate reference list for Supplementary 
information is provided at the end of Supplementary information, 
Data S16. 